Abstract

Describing a problem using classical linear algebra is a very well- 
known problem-solving technique. If your question can be formu- 
lated as a question about real or complex matrices, then the answer 
can often be found by standard techniques. 

It's less well-known that very similar techniques still apply 
where instead of real or complex numbers we have a closed semir- 
ing, which is a structure with some analogue of addition and multi- 
plication that need not support subtraction or division. 

We define a typeclass in Haskell for describing closed semir- 
ings, and implement a few functions for manipulating matrices and 
polynomials over them. We then show how these functions can 
be used to calculate transitive closures, find shortest or longest 
or widest paths in a graph, analyse the data flow of imperative 
programs, optimally pack knapsacks, and perform discrete event 
simulations, all by just providing an appropriate underlying closed 
semiring. 

Categories and Subject Descriptors D.1.1 [Programming Techniques]:
Applicative (Functional) Programming; G.2.2 [Discrete
Mathematics]: Graph Theory - graph algorithms

Keywords closed semirings; transitive closure; linear systems; 
shortest paths 

1. Introduction 

Linear algebra provides an incredibly powerful problem-solving 
toolbox. A great many problems in computer graphics and vision, 
machine learning, signal processing and many other areas can be 
solved by simply expressing the problem as a system of linear 
equations and solving using standard techniques. 

Linear algebra is defined abstractly in terms of fields, of which 
the real and complex numbers are the most familiar examples. 
Fields are sets equipped with some notion of addition and multi- 
plication as well as negation and reciprocals. 

Many discrete mathematical structures commonly encountered
in computer science do not have sensible notions of negation.
Booleans, sets, graphs, regular expressions, imperative programs,
datatypes and various other structures can all be given natural
notions of product (interpreted variously as intersection, sequencing
or conjunction) and sum (union, choice or disjunction), but
generally lack negation or reciprocals.

Such structures, having addition and multiplication (which dis- 
tribute in the usual way) but not in general negation or reciprocals, 
are called semirings. Many structures specifying sequential actions 
can be thought of as semirings, with multiplication as sequencing 
and addition as choice. The distributive law then states, intuitively, 
a followed by a choice between b and c is the same as a choice 
between a followed by b and a followed by c. 

Plain semirings are a very weak structure. We can find many 
examples of them in the wild, but unlike fields which provide 
the toolbox of linear algebra, there isn't much we can do with 
something knowing only that it is a semiring. 

However, we can build some useful tools by introducing the 
closed semiring, which is a semiring equipped with an extra opera- 
tion called closure. With the intuition of multiplication as sequenc- 
ing and addition as choice, closure can be interpreted as iteration. 
As we see in the following sections, it is possible to use something 
akin to Gaussian elimination on an arbitrary closed semiring, giv- 
ing us a means of solving certain "linear" equations over any struc- 
ture with suitable notions of sequencing, choice and iteration. First, 
though, we need to define the notion of semiring more precisely. 

2. Semirings 

We define a semiring formally as consisting of a set R, two
distinguished elements of R named 0 and 1, and two binary operations
+ and ·, satisfying the following relations for any $a, b, c \in R$:

\[
\begin{aligned}
a + b &= b + a \\
a + (b + c) &= (a + b) + c \\
a + 0 &= a \\
a \cdot (b \cdot c) &= (a \cdot b) \cdot c \\
a \cdot 0 &= 0 \cdot a = 0 \\
a \cdot 1 &= 1 \cdot a = a \\
a \cdot (b + c) &= a \cdot b + a \cdot c \\
(a + b) \cdot c &= a \cdot c + b \cdot c
\end{aligned}
\]

We often write $a \cdot b$ as $ab$, and $a \cdot a \cdot a$ as $a^3$.

Our focus will be on closed semirings [12], which are semirings
with an additional operation called closure (denoted $^*$) which
satisfies the axiom:

\[ a^* = 1 + a \cdot a^* = 1 + a^* \cdot a \]

If we have an affine map $x \mapsto ax + b$ in some closed semiring,
then $x = a^*b$ is a fixpoint, since $a^*b = (aa^* + 1)b = a(a^*b) + b$.
So, a closed semiring can also be thought of as a semiring where
affine maps have fixpoints.

The definition of a semiring translates neatly to Haskell: 
infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

There are many useful examples of closed semirings, the sim- 
plest of which is the Boolean datatype: 

instance Semiring Bool where
  zero = False
  one = True
  closure x = True
  (@+) = (||)
  (@.) = (&&)

It is straightforward to show that the semiring axioms are satis- 
fied by this definition. 
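
Because Bool has only two values, the claim can also be checked by brute force. The following is a minimal sketch that repeats the class and instance so that it stands alone; the name axiomsHold is our own and is not part of the code above. It enumerates every triple of Booleans and tests each axiom, including the closure axiom.

```haskell
infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

instance Semiring Bool where
  zero = False
  one = True
  closure _ = True
  (@+) = (||)
  (@.) = (&&)

-- Exhaustively check the semiring axioms and the closure axiom for Bool.
-- `axiomsHold` is a hypothetical helper name introduced for this sketch.
axiomsHold :: Bool
axiomsHold = and
  [ and [ a @+ b == b @+ a
        , a @+ (b @+ c) == (a @+ b) @+ c
        , a @+ zero == a
        , a @. (b @. c) == (a @. b) @. c
        , a @. zero == zero && zero @. a == zero
        , a @. one == a && one @. a == a
        , a @. (b @+ c) == a @. b @+ a @. c
        , (a @+ b) @. c == a @. c @+ b @. c
        , closure a == one @+ a @. closure a
        , closure a == one @+ closure a @. a
        ]
  | a <- bools, b <- bools, c <- bools ]
  where bools = [False, True]
```

A check of this kind is only feasible for finite carriers; for the other semirings in this paper the axioms have to be argued by hand.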

In semirings where summing an infinite series makes sense, we
can define $a^*$ as:

\[ 1 + a + a^2 + a^3 + \cdots \]

since this series satisfies the axiom $a^* = 1 + a \cdot a^*$. In other
semirings where subtraction and reciprocals make sense we can
define $a^*$ as $(1 - a)^{-1}$. Both of these viewpoints will be useful to
describe certain semirings.

The real numbers form a semiring with the usual addition and
multiplication, where $a^* = (1 - a)^{-1}$. Under this definition, $1^*$ is
undefined, an annoyance which can be remedied by adding an extra
element $\infty$ to the semiring, and setting $1^* = \infty$.
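
As a quick sanity check on the two viewpoints, we can take the (arbitrarily chosen) value a = 1/2 in the reals and confirm that the reciprocal definition, the closure axiom and the geometric series all agree:

```latex
\[
a = \tfrac{1}{2}: \qquad
a^* = \left(1 - \tfrac{1}{2}\right)^{-1} = 2, \qquad
1 + a \cdot a^* = 1 + \tfrac{1}{2} \cdot 2 = 2, \qquad
1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \cdots = 2
\]
```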

The regular languages form a closed semiring where · is
concatenation, + is union, and $^*$ is the Kleene star. Here the infinite
geometric series interpretation of $^*$ is the most natural: $a^*$ is the
union of $a^n$ for all n.

3. Matrices and reachability 

Given a directed graph G of n nodes, we can construct its adjacency
matrix M, which is an $n \times n$ matrix of Booleans where $M_{ij}$ is true
if there is an edge from i to j.

We can add such matrices. Using the Boolean semiring's defi- 
nition of addition (i.e. disjunction), the effect of this is to take the 
union of two sets of edges. 

Similarly, we define matrix multiplication in the usual way,
where $(AB)_{ij} = \sum_k A_{ik} \cdot B_{kj}$. The product of two Boolean
matrices A, B is thus true at indices ij if there exists any index
k such that $A_{ik}$ and $B_{kj}$ are both true. In particular, $(M^2)_{ij}$ is true
if there is a path with two edges in G from node i to node j.
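
To make the two-edge claim concrete, here is a small self-contained sketch using plain lists of Booleans rather than the Matrix type introduced later. The names boolMul, m and twoSteps, and the three-node example graph, are illustrative assumptions of ours.

```haskell
import Data.List (transpose)

-- Boolean matrix product: entry (i, j) is true when some index k has
-- both (i, k) true in the left matrix and (k, j) true in the right one.
boolMul :: [[Bool]] -> [[Bool]] -> [[Bool]]
boolMul a b = [[or (zipWith (&&) row col) | col <- transpose b] | row <- a]

-- A three-node graph with edges 0 -> 1 and 1 -> 2 (an illustrative example).
m :: [[Bool]]
m = [ [False, True,  False]
    , [False, False, True ]
    , [False, False, False] ]

-- Paths with exactly two edges: only 0 -> 1 -> 2 qualifies.
twoSteps :: [[Bool]]
twoSteps = boolMul m m
```

Here twoSteps is true only at row 0, column 2, matching the single two-edge path in the graph.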

In general, $M^k$ represents the paths of k edges in the graph
G. A node j is reachable from a node i if there is a path with any
number of edges (including 0) from i to j. This reachability relation
can therefore be described by the following, where I is the identity
matrix:

\[ I + M + M^2 + M^3 + \cdots \]

This looks like the infinite series definition of closure from 
above. Indeed, suppose we could calculate the closure of M, that 
is, a matrix M* such that: 

\[ M^* = I + M \cdot M^* \]

$M^*$ would include the paths of length 0 (the $I$ term), and would
be transitively closed (the $M \cdot M^*$ term). So, if we can show that
n x n matrices of Booleans form a closed semiring, then we can 
use the closure operation to calculate reachability in a graph, or 
equivalently the reflexive transitive closure of a graph. 

Remarkably, for any closed semiring R, the $n \times n$ matrices
of elements of R form a closed semiring. This is a surprisingly
powerful result: as we see in the following sections, the closure
operation can be used to solve several different problems with a
suitable choice of the semiring R.

We define addition and multiplication of $n \times n$ matrices in the
usual way, where:

\[
\begin{aligned}
(A + B)_{ij} &= A_{ij} + B_{ij} \\
(A \cdot B)_{ij} &= \sum_{k=1}^{n} A_{ik} \cdot B_{kj}
\end{aligned}
\]

The matrix 0 is the $n \times n$ matrix where every element is the
underlying semiring's 0, and the matrix 1 has the underlying semiring's
1 along the main diagonal (so $1_{ii} = 1$) and 0 elsewhere.

In Haskell, we use the type Matrix, which represents a matrix 
as a list of rows, each a list of elements, with a special case 
for the representation of scalar matrices (matrices which are zero 
everywhere but the main diagonal, and equal at all points along the 
diagonal). This special case allows us to define matrices zero and 
one without knowing the size of the matrix. 

data Matrix a = Scalar a
              | Matrix [[a]]

To add a scalar to a matrix, we need to be able to move along 
the main diagonal of the matrix. To make this easier, we introduce 
some helper functions for dealing with block matrices. 

A block matrix is a matrix that has been partitioned into several 
smaller matrices. We define a type for matrices that have been 
partitioned into four blocks: 

type BlockMatrix a = (Matrix a, Matrix a,
                      Matrix a, Matrix a)

If a, b, c and d represent the $n \times n$ matrices A, B, C, D, then
the BlockMatrix value (a,b,c,d) represents the $2n \times 2n$ block matrix:

\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]

Joining the components of a block matrix into a single matrix is 
straightforward: 

mjoin :: BlockMatrix a -> Matrix a
mjoin (Matrix a, Matrix b,
       Matrix c, Matrix d) =
  Matrix ((a `hcat` b) ++ (c `hcat` d))
  where hcat = zipWith (++)

For any $n \times m$ matrix where $n, m \geq 2$, we can split the matrix
into a block matrix by peeling off the first row and column:

msplit :: Matrix a -> BlockMatrix a
msplit (Matrix (row:rows)) =
  (Matrix [[first]], Matrix [top],
   Matrix left,      Matrix rest)
  where
    (first:top) = row
    (left, rest) = unzip (map (\(x:xs) -> ([x],xs))
                              rows)

Armed with these, we can start implementing a Semiring in- 
stance for Matrix. 

instance Semiring a => Semiring (Matrix a) where
  zero = Scalar zero
  one = Scalar one

  Scalar a @+ Scalar b = Scalar (a @+ b)
  Matrix a @+ Matrix b =
    Matrix (zipWith (zipWith (@+)) a b)
  Scalar s @+ m = m @+ Scalar s
  Matrix [[a]] @+ Scalar b = Matrix [[a @+ b]]
  m @+ s = mjoin (first @+ s, top,
                  left, rest @+ s)
    where (first, top,
           left, rest) = msplit m

  Scalar a @. Scalar b = Scalar (a @. b)
  Scalar a @. Matrix b = Matrix (map (map (a @.)) b)
  Matrix a @. Scalar b = Matrix (map (map (@. b)) a)
  Matrix a @. Matrix b =
    Matrix [[foldl1 (@+) (zipWith (@.) row col)
            | col <- cols] | row <- a]
    where cols = transpose b

Defining closure for matrices is trickier. Lehmann [12] gave 
a definition of M* for an arbitrary matrix M which satisfies the 
axioms of a closed semiring, and two algorithms for calculating it. 
The first of these generalises the Floyd-Warshall algorithm for all- 
pairs shortest paths [6], while the second is a semiring-flavoured 
form of Gaussian elimination. 

Both are specified imperatively, via indexing and mutation of
matrices represented as arrays. However, an elegant functional
implementation can be derived almost directly from a lemma used to
prove the correctness of the imperative algorithms. Given a block
matrix

\[ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]

Lehmann shows that its closure $M^*$ will satisfy

\[ M^* = \begin{pmatrix} A^* + B'\Delta^*C' & B'\Delta^* \\ \Delta^*C' & \Delta^* \end{pmatrix} \]

where $B' = A^*B$, $C' = CA^*$ and $\Delta = D + CA^*B$. The closure
of a $1 \times 1$ matrix is easily calculated since $(a)^* = (a^*)$, so this
leads directly to an implementation of closure for matrices:

  closure (Matrix [[x]]) = Matrix [[closure x]]
  closure m = mjoin
    (first' @+ top' @. rest' @. left', top' @. rest',
     rest' @. left',                   rest')
    where
      (first, top, left, rest) = msplit m
      first' = closure first
      top'   = first' @. top
      left'  = left @. first'
      rest'  = closure (rest @+ left' @. top)

Multiplying a $p \times q$ matrix by a $q \times r$ matrix takes $O(pqr)$
operations from the underlying semiring. The closure function,
when given an $n \times n$ matrix, does $O(n^2)$ semiring operations via
matrix multiplication (by multiplying $1 \times n$ and $n \times n$ matrices,
or $n \times 1$ and $1 \times n$), plus $O(n^2)$ semiring operations via matrix
addition and msplit, plus one recursive call.

The recursive call to closure is passed an $(n - 1) \times (n - 1)$
matrix, and so the total number of semiring operations done by
closure for an $n \times n$ matrix is $O(n^3)$. Thus, closure has the
same complexity as calculating transitive closure using the
Floyd-Warshall algorithm.
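
The cubic bound follows from unrolling the recursion. Writing T(n) for the number of semiring operations and c for an assumed constant bounding the quadratic work at each level (both symbols are ours), we get:

```latex
\[
T(n) = T(n-1) + c\,n^2, \quad T(1) = c
\quad\Longrightarrow\quad
T(n) = c \sum_{k=1}^{n} k^2 = c\,\frac{n(n+1)(2n+1)}{6} = O(n^3)
\]
```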

However, since it processes the entire graph and always pro- 
duces an answer for all pairs of nodes, it is slower than standard al- 
gorithms for checking reachability between a single pair of nodes. 
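
For readers who want to run the code of this section, the fragments above can be assembled into one module roughly as follows. This is a sketch rather than a definitive version: the deriving clause, the error fallbacks in mjoin and msplit, and the example graph reach (edges 0 -> 1 and 1 -> 2) are our additions.

```haskell
import Data.List (transpose)

infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

instance Semiring Bool where
  zero = False
  one = True
  closure _ = True
  (@+) = (||)
  (@.) = (&&)

data Matrix a = Scalar a | Matrix [[a]]
  deriving (Eq, Show)  -- added here so results can be compared and printed

type BlockMatrix a = (Matrix a, Matrix a, Matrix a, Matrix a)

mjoin :: BlockMatrix a -> Matrix a
mjoin (Matrix a, Matrix b, Matrix c, Matrix d) =
  Matrix ((a `hcat` b) ++ (c `hcat` d))
  where hcat = zipWith (++)
mjoin _ = error "mjoin: scalar block"  -- fallback added for this sketch

msplit :: Matrix a -> BlockMatrix a
msplit (Matrix (row : rows)) =
  (Matrix [[first]], Matrix [top], Matrix left, Matrix rest)
  where
    (first : top) = row
    (left, rest) = unzip (map (\(x : xs) -> ([x], xs)) rows)
msplit _ = error "msplit: needs a non-empty Matrix"  -- fallback added here

instance Semiring a => Semiring (Matrix a) where
  zero = Scalar zero
  one = Scalar one

  Scalar a @+ Scalar b = Scalar (a @+ b)
  Matrix a @+ Matrix b = Matrix (zipWith (zipWith (@+)) a b)
  Scalar s @+ m = m @+ Scalar s
  Matrix [[a]] @+ Scalar b = Matrix [[a @+ b]]
  m @+ s = mjoin (first @+ s, top, left, rest @+ s)
    where (first, top, left, rest) = msplit m

  Scalar a @. Scalar b = Scalar (a @. b)
  Scalar a @. Matrix b = Matrix (map (map (a @.)) b)
  Matrix a @. Scalar b = Matrix (map (map (@. b)) a)
  Matrix a @. Matrix b =
    Matrix [[foldl1 (@+) (zipWith (@.) row col) | col <- cols] | row <- a]
    where cols = transpose b

  closure (Matrix [[x]]) = Matrix [[closure x]]
  closure m = mjoin
    (first' @+ top' @. rest' @. left', top' @. rest',
     rest' @. left',                   rest')
    where
      (first, top, left, rest) = msplit m
      first' = closure first
      top'   = first' @. top
      left'  = left @. first'
      rest'  = closure (rest @+ left' @. top)

-- Reflexive transitive closure of the graph with edges 0 -> 1 and 1 -> 2.
reach :: Matrix Bool
reach = closure (Matrix [ [False, True,  False]
                        , [False, False, True ]
                        , [False, False, False] ])
```

Evaluating reach gives a matrix that is true on and above the diagonal: every node reaches itself, and node 0 reaches node 2 through node 1.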

4. Graphs and paths 

We've already seen that the reflexive transitive closure of a graph 
can be found using the above closure function, but it seems like 
a lot of work just to define reachability! However, choosing a 
richer underlying semiring allows us to calculate more interesting 
properties of the graph, all with the same closure algorithm. 



The tropical semiring (more sensibly known as the min-plus
semiring) has as its underlying set the nonnegative integers
augmented with an extra element $\infty$, and defines its + and ·
operators as min and addition respectively. This semiring describes the
length of the shortest path in a graph: ab is interpreted as a path
through a and then b (so we sum the distances), and a + b is a
choice between a path through a and a path through b (so we pick
the shorter one). We express this in Haskell as follows, using the
value Unreachable to represent $\infty$:

data ShortestDistance = Distance Int | Unreachable
instance Semiring ShortestDistance where
  zero = Unreachable
  one = Distance 0
  closure x = one

  x @+ Unreachable = x
  Unreachable @+ x = x
  Distance a @+ Distance b = Distance (min a b)

  x @. Unreachable = Unreachable
  Unreachable @. x = Unreachable
  Distance a @. Distance b = Distance (a + b)

For a directed graph with edge lengths, we make a matrix
M such that $M_{ij}$ is the length of the edge from i to j, or
Unreachable if there is none. M is represented in Haskell with
the type Matrix ShortestDistance, and calling closure
calculates the length of the shortest path between any two nodes.

To see this, we can appeal again to the infinite series view of
closure: $(M^k)_{ij}$ is the length of the shortest path with k edges from
node i to node j, and $M^*$ is the sum (which in this semiring means
"minimum") of $M^k$ for any k. Thus, $(M^*)_{ij}$ is the length of the
shortest path with any number of edges from node i to node j.
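
A small worked example may help. Take a three-node graph (chosen by us for illustration) with edges 1 to 2 of length 1, 2 to 3 of length 2, and 1 to 3 of length 5. In the min-plus semiring the identity matrix has 0 on the diagonal and $\infty$ elsewhere, the only two-edge path is 1 to 2 to 3 with length 1 + 2 = 3, and the graph is acyclic so all higher powers of M are entirely $\infty$:

```latex
\[
M = \begin{pmatrix} \infty & 1 & 5 \\ \infty & \infty & 2 \\ \infty & \infty & \infty \end{pmatrix},
\qquad
M^2 = \begin{pmatrix} \infty & \infty & 3 \\ \infty & \infty & \infty \\ \infty & \infty & \infty \end{pmatrix},
\qquad
M^* = I + M + M^2 = \begin{pmatrix} 0 & 1 & 3 \\ \infty & 0 & 2 \\ \infty & \infty & 0 \end{pmatrix}
\]
```

The entry for the pair (1, 3) is min(5, 3) = 3: the two-edge route beats the direct edge.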

Often we're interested in finding the actual shortest path, not just 
its length. We can define another semiring that keeps track of this 
data, where paths are represented as lists of edges, each represented 
as a pair of nodes. 

There may not be a unique shortest path. If we are faced with a 
choice between two equally short paths, we must either have some 
means of disambiguating them or be prepared to return multiple 
results. In the following implementation, we choose the former: we 
assume nodes are ordered and choose the lexicographically least of 
multiple equally short paths. 

data ShortestPath n = Path Int [(n,n)] | NoPath
instance Ord n => Semiring (ShortestPath n) where
  zero = NoPath
  one = Path 0 []
  closure x = one

  x @+ NoPath = x
  NoPath @+ x = x
  Path a p @+ Path a' p'
    | a < a'            = Path a p
    | a == a' && p < p' = Path a p
    | otherwise         = Path a' p'

  x @. NoPath = NoPath
  NoPath @. x = NoPath
  Path a p @. Path a' p' = Path (a + a') (p ++ p')

The @. operator given here isn't especially fast since ++ takes
time linear in the length of its left argument, but this can be avoided
by using an alternative data structure with constant-time appends
such as difference lists.

We construct the matrix M, where $M_{ij}$ is Path d [(i,j)]
if there's an edge of length d between nodes i and j, or NoPath
if there's none. Calculating $M^*$ in this semiring will calculate not
only the length of the shortest path between all pairs of nodes, but
also the actual route.

To calculate longest paths we can use a similar construction. We 
have to be slightly more careful here, because a graph with cycles 
contains arbitrarily long paths. 

As well as nonnegative integer distances, we have two other 
possible values: LUnreachable, indicating that there is no path 
between two nodes, and LInfinite, indicating that there's an 
infinitely long path due to a cycle of positive length. This forms 
a semiring as shown: 

data LongestDistance = LDistance Int
                     | LUnreachable
                     | LInfinite
instance Semiring LongestDistance where
  zero = LUnreachable
  one = LDistance 0

  closure LUnreachable = LDistance 0
  closure (LDistance 0) = LDistance 0
  closure _ = LInfinite

  x @+ LUnreachable = x
  LUnreachable @+ x = x
  LInfinite @+ _ = LInfinite
  _ @+ LInfinite = LInfinite
  LDistance x @+ LDistance y = LDistance (max x y)

  x @. LUnreachable = LUnreachable
  LUnreachable @. x = LUnreachable
  LInfinite @. _ = LInfinite
  _ @. LInfinite = LInfinite
  LDistance x @. LDistance y = LDistance (x + y)

We can find the widest path (also known as the highest-capacity 
path) by using min instead of addition as semiring multiplication 
(to pick the narrowest of successive edges in a path). By working 
with real numbers as edge weights, interpreted as the probability of 
failure of a given edge, we can calculate the most reliable path. 
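
The text gives no code for the widest-path semiring, so the following is only a sketch under our own assumptions: the type Widest and its constructors are hypothetical names, capacities are assumed nonnegative, addition is taken to be max (prefer the wider route), and closure is taken to be the unit because looping can never widen a path.

```haskell
infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

-- Hypothetical type: a nonnegative edge capacity, or no limit at all.
-- The derived Ord places every Capacity value below Unlimited.
data Widest = Capacity Int | Unlimited
  deriving (Eq, Ord, Show)

instance Semiring Widest where
  zero = Capacity 0      -- no usable path (assumes capacities >= 0)
  one = Unlimited        -- the empty path constrains nothing
  closure _ = Unlimited  -- going round a loop never beats the empty path
  (@+) = max             -- choice: take the wider route
  (@.) = min             -- sequencing: the narrowest edge limits the path

-- Two routes: edges of width 4 then 7, or edges of width 5 then 3.
bestWidth :: Widest
bestWidth = Capacity 4 @. Capacity 7 @+ Capacity 5 @. Capacity 3
```

With these definitions bestWidth is Capacity 4: the first route is limited to 4 and the second to 3, and the choice keeps the wider of the two.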

There is an intuition for closure for an arbitrary graph and an
arbitrary semiring. Each edge of the graph is assigned an element
of the semiring, which make up the elements of the matrix M. Any
path (sequence of edges) is assigned the product of the elements
on each edge, and $(M^*)_{ij}$ is the sum of the products assigned to
every path from i to j. The fact that product distributes over sum
means we can calculate this, using the above closure algorithm,
in polynomial time.

We can use this intuition to construct powerful graph analyses,
simply by making an appropriate semiring and calculating the
closure of a graph's adjacency matrix. For instance, we can make
a semiring of subsets of nodes of a graph, where + is intersection
and · is union. We set $M_{ij} = \{i, j\}$, and calculate $M^*$. Each path
is assigned the set of nodes visited along that path, and taking the
"sum over all paths" calculates the intersection of those sets, or the
nodes visited along all paths. Thus, $(M^*)_{ij}$ gives the set of nodes that
are visited along all paths from i to j, or in other words, the graph
dominators of j with start node i.

5. "Linear" equations and regular languages 

One of the sharpest tools in the toolbox of linear algebra is the use 
of matrix techniques to solve systems of linear equations. 

Since we get to specify the semiring, we define what "linear" 
means. Many problems can then be described as systems of "linear" 
equations, even though they're far from linear in the classical sense. 

Suppose we have a system of equations in some semiring on a
set of variables $x_1, \ldots, x_n$, where each $x_i$ is defined by a linear
combination of the variables and a constant term. That is, each
equation is of the following form, where $a_{ij}$ and $b_i$ are givens:

\[ x_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n + b_i \]

Figure 1. A finite state machine and its matrix representation

We arrange the unknowns $x_i$ into a column vector X, the
coefficients $a_{ij}$ into a square matrix A and the constants $b_i$ into a
column vector B. The system of equations now becomes:

\[ X = AX + B \]

This equation defines X as the fixpoint of an affine map. As we
saw in section 2, it therefore has a solution $X = A^*B$, which can
be calculated with our definition of closure for matrices.

The above was a little cavalier with matrix dimensions. Technically,
our machinery for solving such equations is only defined for
$n \times n$ matrices, not column vectors. However, we can extend the
column vectors X and B to $n \times n$ matrices by making a matrix all
of whose columns are equal. Solving the equation with such matrices
comes to the same answer as using column vectors directly, so
we keep working with column vectors. Happily, our Haskell code
for manipulating matrices accepts column and row vectors without
problems, as long as we don't try to calculate the closure of
anything but a square matrix.

If we have some system that maps input to output, where the
mapping can be described as a linear map $X \mapsto AX + B$, then the
fixpoint $X = A^*B$ gives a "stable state" of the system.

As we see below, Kleene's proof that all finite state machines 
accept a regular language and the McNaughton-Yamada algo- 
rithm [15] for constructing a regular expression to describe a state 
machine can also be described as such "linear" systems. 

Given a description of a finite state machine, we can write down
a regular grammar describing the language it accepts. For every
transition $q_A \xrightarrow{x} q_B$, we have a grammar production $A \to xB$, and
for every accepting state $q_A$ we have a production $A \to \varepsilon$.

We can group these productions by their left-hand sides to give a
system of equations. For instance, in the machine of Figure 1 there
is a state $q_B$ with transitions $q_B \xrightarrow{y} q_A$ and $q_B \xrightarrow{z} q_C$, so we get
the equation:

\[ B = yA + zC \]

In the semiring of regular languages (where addition is union,
multiplication is concatenation, and closure is the Kleene star),
these are all linear equations. For an n-state machine, we define
the $n \times n$ matrix M where $M_{ij}$ is the symbol on the transition
from the ith state to the jth, or 0 (the empty language) if there is
no such transition. The vector A is constructed by setting $A_i$ to
1 (the language containing only $\varepsilon$) when the ith state is accepting,
and 0 otherwise. The languages are represented by the vector of
unknowns L, where $L_i$ is the language accepted starting in the ith
state. For our example machine, M and A are shown at the right of
Figure 1.

Then, the regular grammar is described by:

\[ L = M \cdot L + A \]

This equation has a solution given by $L = M^* \cdot A$. We can use
our existing closure function to solve these equations and build
a regular expression that describes the language accepted by the
finite state machine.

In Haskell, we define a "free" semiring which simply records
the syntax tree of semiring expressions. To qualify as a semiring
we must have a @+ b == b @+ a, a @. one == a, and so on.
We sidestep this by cheating: we don't define an Eq instance for
FreeSemiring, and consider two FreeSemiring values equal if
they are equal according to the semiring laws. However, to make
our FreeSemiring values more compact, we do implement certain
simplifications like 0x = 0.

data FreeSemiring gen =
    Zero
  | One
  | Gen gen
  | Closure (FreeSemiring gen)
  | (FreeSemiring gen) :@+ (FreeSemiring gen)
  | (FreeSemiring gen) :@. (FreeSemiring gen)

instance Semiring (FreeSemiring gen) where
  zero = Zero
  one = One

  Zero @+ x = x
  x @+ Zero = x
  x @+ y = x :@+ y

  Zero @. x = Zero
  x @. Zero = Zero
  One @. x = x
  x @. One = x
  x @. y = x :@. y

  closure Zero = One
  closure x = Closure x

If we construct M as a Matrix (FreeSemiring Char), then 
calculating M* • A will give us a vector of FreeSemiring Char, 
each element of which can be interpreted as a regular expression 
describing the language accepted from a particular state. 

For the example of Figure 1, closure then tells us that the lan- 
guage accepted with state A as the starting state is x(yx)*z. This 
algorithm produces a regular expression that accurately describes 
the language accepted by a given state machine, but it is not in gen- 
eral the shortest such expression. 
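
The stated result can be checked by hand using substitution and the affine fixpoint rule from section 2. Figure 1 is not reproduced in this text, so the equations for A and C below are assumptions on our part, chosen to be consistent with the equation B = yA + zC given earlier and with the result x(yx)*z: we assume a single transition from state A to state B labelled x, and that C is accepting with no outgoing transitions.

```latex
\begin{align*}
A &= xB, \qquad B = yA + zC, \qquad C = 1 \\
B &= y(xB) + z = (yx)B + z \\
B &= (yx)^* z \\
A &= x(yx)^* z
\end{align*}
```

The third line uses the fact that an affine map $u \mapsto (yx)u + z$ has the fixpoint $(yx)^* z$.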

6. Dataflow analysis 

Many program analyses and optimisations can be computed by 
dataflow analysis. As an example, we consider the classical live- 
ness analysis, which computes which assignments in an impera- 
tive program assign values which will never be read ("dead"), and 
which ones may be used again ("live"). 

We construct the program's control flow graph by dividing it
into control-free basic blocks, with edges indicating where there
are jumps between blocks. For a given basic block b, the set of
variables live at the start of the block ($\mathrm{IN}_b$) are those used by the
basic block itself ($\mathrm{USE}_b$) before their first definition, and those
which are live at the end of the block ($\mathrm{OUT}_b$) but not assigned
a new value during the block ($\mathrm{DEF}_b$). The variables live at the end
of a basic block are those live at the start of any successor.

This gives a system of equations:

\[
\begin{aligned}
\mathrm{IN}_b &= (\mathrm{OUT}_b \cap \overline{\mathrm{DEF}_b}) \cup \mathrm{USE}_b \\
\mathrm{OUT}_b &= \bigcup_{b' \in \mathrm{succ}(b)} \mathrm{IN}_{b'}
\end{aligned}
\]

    A:  x := 1
        z := 0
    B:  while x < y:
    C:      x := x * 2
            z := z + 1
    D:  return x

Figure 2. A simple imperative program to calculate the smallest
power of two greater than the input y, and its control flow graph.
The variable z does not affect the output.



An example program is given in Figure 2, where DEF and USE
are as follows:

\[
\begin{array}{ll}
\mathrm{DEF}_A = \{x, z\} & \mathrm{USE}_A = \emptyset \\
\mathrm{DEF}_B = \emptyset & \mathrm{USE}_B = \{x, y\} \\
\mathrm{DEF}_C = \{x, z\} & \mathrm{USE}_C = \{x, z\} \\
\mathrm{DEF}_D = \emptyset & \mathrm{USE}_D = \{x\}
\end{array}
\]

If we solve for $\mathrm{IN}_b$ and $\mathrm{OUT}_b$, we find that z is not live upon
entry to D (that is, $z \notin \mathrm{IN}_D$). However, z is considered live on
entry to C, even though it never affects the output of the program.
We see how to remedy this using faint variables analysis below,
but first we show how the classical live variables analysis can be
calculated using our semiring machinery.

We define a semiring of sets of variables, where 0 is the empty
set, 1 is the set of all variables in the program, + is union, · is
intersection, and $x^* = 1$ for all sets x. Our system of equations can
be represented as follows in this semiring:

\[
\begin{aligned}
\mathrm{OUT}_b &= \sum_{b' \in \mathrm{succ}(b)} \mathrm{IN}_{b'} \\
&= \sum_{b' \in \mathrm{succ}(b)} \left( \overline{\mathrm{DEF}_{b'}} \cdot \mathrm{OUT}_{b'} + \mathrm{USE}_{b'} \right) \\
&= \sum_{b' \in \mathrm{succ}(b)} \overline{\mathrm{DEF}_{b'}} \cdot \mathrm{OUT}_{b'} + \sum_{b' \in \mathrm{succ}(b)} \mathrm{USE}_{b'}
\end{aligned}
\]



This is a system of affine equations over the variables $\mathrm{OUT}_b$,
with coefficients $\overline{\mathrm{DEF}_{b'}}$ and constant terms $\sum_{b' \in \mathrm{succ}(b)} \mathrm{USE}_{b'}$.
As before, we can solve it by building a matrix M containing the
coefficients and a column vector A containing the constant terms.
The solution vector $\mathrm{OUT}_b$ is given by $M^* \cdot A$, using the same
closure algorithm.
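
As a cross-check on the liveness equations for the program of Figure 2, the following sketch solves them by plain fixpoint iteration over Data.Set values rather than by the matrix closure described above. The block labels, successor lists and all function names (program, inOf, step, liveOut, liveIn) are our own, read off the program text, and the containers package is assumed to be available.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Vars = Set.Set Char

-- One basic block: variables defined, variables used, successor labels.
data Block = Block { defs :: Vars, uses :: Vars, succs :: [Char] }

-- The four blocks of the example (labels and edges are our reading of it).
program :: Map.Map Char Block
program = Map.fromList
  [ ('A', Block (Set.fromList "xz") Set.empty           "B")
  , ('B', Block Set.empty           (Set.fromList "xy") "CD")
  , ('C', Block (Set.fromList "xz") (Set.fromList "xz") "B")
  , ('D', Block Set.empty           (Set.fromList "x")  "")
  ]

-- IN_b computed from a current guess at every OUT_b.
inOf :: Map.Map Char Vars -> Char -> Vars
inOf outs b = (out `Set.difference` defs blk) `Set.union` uses blk
  where
    blk = program Map.! b
    out = Map.findWithDefault Set.empty b outs

-- One round of OUT_b = union of IN_b' over the successors b' of b.
step :: Map.Map Char Vars -> Map.Map Char Vars
step outs = Map.map (\blk -> Set.unions (map (inOf outs) (succs blk))) program

-- Iterate from empty sets until nothing changes.
liveOut :: Map.Map Char Vars
liveOut = go (Map.map (const Set.empty) program)
  where go outs = let outs' = step outs
                  in if outs' == outs then outs else go outs'

-- Variables live on entry to a block, using the converged OUT sets.
liveIn :: Char -> Vars
liveIn = inOf liveOut
```

Under these assumptions z is a member of liveIn 'C' but not of liveIn 'D', in agreement with the discussion of the example.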

Dataflow analyses can be treated more generally by studying
the transfer functions of each basic block. We consider backwards
analyses (like liveness analysis), where the transfer functions
specify $\mathrm{IN}_b$ given $\mathrm{OUT}_b$ (the discussion applies equally well to
forwards analyses, with suitable relabelling).

The equations have the following form:

\[
\begin{aligned}
\mathrm{IN}_b &= f_b(\mathrm{OUT}_b) \\
\mathrm{OUT}_b &= \operatorname*{join}_{b' \in \mathrm{succ}(b)} \mathrm{IN}_{b'}
\end{aligned}
\]

or, more compactly,

\[ \mathrm{OUT}_b = \operatorname*{join}_{b' \in \mathrm{succ}(b)} f_{b'}(\mathrm{OUT}_{b'}) \]

For many standard analyses, we can define a semiring where $f_b$
is linear, and join is summation. This semiring is often the semiring
of sets of variables (as above) or expressions, or its dual where +
is intersection, · is union, 0 is the entire set of variables and 1 is the
empty set.

Such analyses are often referred to as bit-vector problems [11],
as the sets can be represented efficiently using bitwise operations.

The available expressions analysis can be expressed as a linear
system using this latter semiring, where the set of available
expressions at the start of a block is the intersection of the sets of
available expressions at the end of each predecessor.

Other analyses don't have such a simple structure. The faint 
variables analysis is an extension of live variables analysis which 
can detect that certain assignments are useless even though they 
are considered "live" by the standard analysis. For instance, in 
Figure 2, the variable z was considered live at the start of block 
C, even though all statements involving z can be deleted without 
affecting the meaning of the program. Live variable analysis will 
consider x "live" since it may be used on the next iteration. Faint 
variable analysis finds the strongly live variables: those used to 
compute a value which is not dead. 

We can write transfer functions for faint variables analysis,
which when given $\mathrm{OUT}_b$ compute $\mathrm{IN}_b$. For instance, in our
example $x \in \mathrm{IN}_C$ if $x \in \mathrm{OUT}_C$ (since x is used to compute the new
value of x), and similarly $z \in \mathrm{IN}_C$ if $z \in \mathrm{OUT}_C$.

These transfer functions don't fall into the class of bit-vector
problems, and so our previous tactic won't work directly. However,
they are in the more general class of distributive dataflow
problems [10]: they have the property that $f_b(A \cup B) = f_b(A) \cup f_b(B)$.
Happily, such transfer functions form a semiring.

We define a datatype to describe such functions which distribute 
over set union. For more generality, we consider an arbitrary com- 
mutative monoid instead of limiting ourselves to set union. Haskell 
has a standard definition of monoids, but they are not generally 
required to be commutative. We define the typeclass Commutative- 
Monoid for those monoids which are commutative. It has no meth- 
ods and instances are trivial; it serves only as a marker by which 
the programmer can certify his monoid does commute. 

class Monoid m => CommutativeMonoid m 
instance Ord a => CommutativeMonoid (Set a) 

With that done, we can define the semiring of transfer functions: 

newtype Transfer m = Transfer (m -> m)
instance (Eq m, CommutativeMonoid m) =>
         Semiring (Transfer m) where
  zero = Transfer (const mempty)
  one = Transfer id
  Transfer f @+ Transfer g =
    Transfer (\x -> f x `mappend` g x)
  Transfer f @. Transfer g = Transfer (f . g)

Multiplication in this semiring is composition and addition is 
pointwise mappend, which is union in the case of sets. The dis- 
tributive law is satisfied assuming all of the transfer functions them- 
selves distribute over mappend. 

The closure of a transfer function is a function $f^*$ such that
$f^* = 1 + f \cdot f^*$. When applied to an argument, we expect that
$f^*(x) = x + f(f^*(x))$. The closure can be defined as a fixpoint
iteration, which will give a suitable answer if it converges:

  closure (Transfer f) =
    Transfer (\x -> fixpoint (\y -> x `mappend` f y) x)
    where fixpoint f init =
            if init == next
              then init
              else fixpoint f next
            where next = f init

Convergence of this fixpoint iteration is not automatically
guaranteed. However, it always converges when the transfer functions
and the monoid operation are monotonic increasing in an order of
finite height (such as the set of variables in a program), so this gives
us a valid definition of closure for our transfer functions.
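
To see the fixpoint iteration at work, here is a self-contained sketch that repeats the definitions above and applies the closure of one toy transfer function. The function bump, the helper apply and the value reached are hypothetical names of ours; bump distributes over union, and its cap at 5 keeps the iteration within a finite set so that it converges.

```haskell
import qualified Data.Set as Set

infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

class Monoid m => CommutativeMonoid m
instance Ord a => CommutativeMonoid (Set.Set a)

newtype Transfer m = Transfer (m -> m)

instance (Eq m, CommutativeMonoid m) => Semiring (Transfer m) where
  zero = Transfer (const mempty)
  one = Transfer id
  Transfer f @+ Transfer g = Transfer (\x -> f x `mappend` g x)
  Transfer f @. Transfer g = Transfer (f . g)
  closure (Transfer f) =
    Transfer (\x -> fixpoint (\y -> x `mappend` f y) x)
    where
      -- Repeat h until the value stops changing.
      fixpoint h start
        | start == next = start
        | otherwise = fixpoint h next
        where next = h start

-- Hypothetical transfer function: shift every element up by one,
-- discarding anything above 5 so the iteration stays bounded.
bump :: Transfer (Set.Set Int)
bump = Transfer (Set.filter (<= 5) . Set.map (+ 1))

-- Unwrap a transfer function and run it.
apply :: Transfer m -> m -> m
apply (Transfer f) = f

-- Everything obtainable from {0} by applying bump zero or more times.
reached :: Set.Set Int
reached = apply (closure bump) (Set.singleton 0)
```

Here reached is the set of integers from 0 to 5, which is the least set containing 0 that is closed under bump.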

We can then calculate $M^*$, where M is the matrix of basic block
transfer functions. $M^*$ gives us their "transitive closure", which
are the transfer functions of the program as a whole. Calling these
functions with a trivial input (say, the empty set of variables in the
case of faint variable analysis) allows us to generate the solution to
the dataflow equations.

7. Polynomials, power series and knapsacks 

Given any semiring R, we can define the semiring of polynomials
in one variable x whose coefficients are drawn from R, which is
written R[x]. We represent polynomials as a list of coefficients,
where the ith element of the list represents the coefficient of $x^i$.
Thus, $3 + 4x^2$ is represented as the list [3, 0, 4].

We can start defining an instance of Semiring for such lists. 
The zero and unit polynomials are given by (where the one on the 
right-hand-side refers to r's one): 

instance Semiring r => Semiring [r] where
  zero = []
  one = [one]

Addition is fairly straightforward: we add corresponding
coefficients. If one list is shorter than the other, it's considered to
be padded with zeros to the correct length (since $1 + 2x =
1 + 2x + 0x^2 + 0x^3 + \cdots$).

  [] @+ y = y
  x @+ [] = x
  (x:xs) @+ (y:ys) = (x @+ y) : (xs @+ ys)

The head of the list representation of a polynomial is the con- 
stant term, and the tail is the sum of all terms divisible by x. So, the 
Haskell value a : p corresponds to the polynomial a + px, where p 
is itself a polynomial. Multiplying two of these gives us: 

(a + px) (b + qx) = ab + (aq + pb + pqx)x 

This directly translates into an implementation of polynomial mul- 
tiplication: 

  [] @. _ = []
  _ @. [] = []
  (a:p) @. (b:q) = (a @. b) : (map (a @.) q @+
                               map (@. b) p @+
                               (zero : (p @. q)))

If we multiply a polynomial with coefficients a_i (that is, the
polynomial \sum_i a_i x^i) by one with coefficients b_i, resulting in
the polynomial with coefficients c_i, then the coefficients are
related by:

c_n = \sum_{i=0}^{n} a_i b_{n-i}

This is the discrete convolution of two sequences. Our definition 
of @. is in fact a pleasantly index-free definition of convolution.
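
To check that the recursive definition agrees with the index-based convolution formula, the sketch below specialises both to Integer coefficients. The names addP, mulP, coeff and convolveAt are introduced here for illustration and do not appear in the semiring instance itself.

```haskell
-- Polynomials as coefficient lists, constant term first: [3,0,4] is 3 + 4x^2.
addP :: [Integer] -> [Integer] -> [Integer]
addP [] ys = ys
addP xs [] = xs
addP (x:xs) (y:ys) = (x + y) : addP xs ys

-- Same recursive shape as the semiring definition:
-- (a + px)(b + qx) = ab + (aq + pb + pqx)x.
mulP :: [Integer] -> [Integer] -> [Integer]
mulP [] _ = []
mulP _ [] = []
mulP (a:p) (b:q) =
  (a * b) : (map (a *) q `addP` map (* b) p `addP` (0 : mulP p q))

-- Coefficient lookup that treats entries past the end of the list as 0.
coeff :: [Integer] -> Int -> Integer
coeff cs k = case drop k cs of
  (c:_) -> c
  []    -> 0

-- Index-based convolution: c_n is the sum of a_i * b_(n-i) for i from 0 to n.
convolveAt :: [Integer] -> [Integer] -> Int -> Integer
convolveAt xs ys n = sum [coeff xs i * coeff ys (n - i) | i <- [0 .. n]]
```

For example, mulP [3,0,4] [1,2] gives [3,6,4,8], which is (3 + 4x^2)(1 + 2x). The recursive definition can leave trailing zero coefficients (mulP [1,2] [3,4] gives [3,10,8,0]), which is why the comparison goes through coeff rather than list equality.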

In order to give a valid definition of Semiring, we must define 
the operation s* such that s* = 1 + s · s*. This seems impossible:
for instance, there is no polynomial p such that p = 1 + xp, since 
the degrees of both sides don't match. 

To form a closed semiring, we need to generalise somewhat and
consider not just polynomials, but arbitrary formal power series,
which are polynomials which may be infinite, giving us the semiring
R[[x]].

Our power series are purely formal, representing sequences of
elements from a closed semiring. We have no notion of "evaluating"
such a series by specifying x. We think of the formal power
series 1 + x + x^2 + x^3 + ... as the sequence 1, 1, 1, ..., and
require no infinite sums, limits or convergence. As such,
"multiplication by x" simply means "shifting the series by one
place", and we can write pxq = pqx (but not pqx = qpx) even when the
underlying semiring does not have commutative multiplication.

Since Haskell's lists are lazily defined and may be infinite, 
our existing definitions for addition and multiplication work for 
such power series, as demonstrated by McIlroy in his functional
pearl [14]. 

Given a power series s = a + px, we seek its closure s* =
b + qx, such that s* = 1 + s · s*:

b + qx = 1 + (a + px)(b + qx)
       = 1 + ab + aqx + p(b + qx)x

The constant terms must satisfy b = 1 + ab, so a solution is
given by b = a*. The other terms must satisfy q = aq + ps*. This
states that q is the fixpoint of an affine map, so a solution is given
by q = a*ps* and thus s* = a*(1 + ps*). This translates neatly
into lazy Haskell as follows:

  closure [] = one
  closure (a:p) = r
    where r = [closure a] @. (one : (p @. r))

This allows us to solve affine equations of the form x = bx + c,
where the unknown x and the parameters b and c are all power 
series over an arbitrary semiring. This form of problem arises 
naturally in some dynamic programming problems. As an example, 
we consider the unbounded integer knapsack problem. 

We are given n types of item, with nonnegative integer weights
w_1, ..., w_n and nonnegative integer values v_1, ..., v_n. Our
knapsack can only hold a weight W, and we want to find out the
maximal value that we can fit in the knapsack by choosing some number
of each item, while remaining below or equal to the weight limit W.
This problem is NP-hard, but admits an algorithm with complexity 
polynomial in W (this is referred to as a pseudo-polynomial time 
algorithm since in general W can be exponentially larger than the 
description of the problem). 

The algorithm calculates a table t, where t(w) is the maximum 
value possible within a weight limit w. We set t(0) to 0, and for all 
other w: 

t(w) = \max_i (v_i + t(w - w_i))

This expresses that the optimal packing of the knapsack to 
weight w can be found by looking at all of the elements which 
could be inserted and choosing the one that gives the highest value. 
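
Before moving to semirings, the recurrence can be tabulated directly. The sketch below is one literal reading of it, under two assumptions of ours: item weights are strictly positive, and an entry is Nothing when no combination of items has exactly that total weight. The names Item, table and within are hypothetical.

```haskell
-- Items as (weight, value) pairs; weights are assumed to be positive.
type Item = (Int, Integer)

-- table items !! w is the best total value at total weight exactly w,
-- or Nothing when no combination of items has that weight.
table :: [Item] -> [Maybe Integer]
table items = t
  where
    t = map best [0 ..]
    best 0 = Just 0
    best w = maxMaybe [fmap (v +) (t !! (w - wi)) | (wi, v) <- items, wi <= w]
    maxMaybe candidates = case [x | Just x <- candidates] of
      [] -> Nothing
      xs -> Just (maximum xs)

-- Best value at any total weight up to the limit (limit assumed >= 0).
within :: [Item] -> Int -> Integer
within items limit = maximum [x | Just x <- take (limit + 1) (table items)]
```

With items [(3,1),(6,5),(8,7),(6,10)], the entries at weights 6, 9 and 12 are Just 10, Just 11 and Just 20, while weight 7 gives Nothing because no combination weighs exactly 7. Read this way, the value within a limit is the maximum over all entries up to that limit, which is what within computes.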

The algebraic structure of this algorithm becomes more clear 
when we rewrite it using the max-plus semiring that we earlier 
used to calculate longest paths, which we implemented in Haskell 
as LongestDistance. Confusingly, in this semiring the unit element
is the number 0, since that is the identity of the semiring's
multiplication, which is addition of path lengths. The zero element
of this semiring is -∞, which is the identity of max.

We take v_i and t(w) to be elements of this semiring. Then, in
this semiring's notation,

t(0) = 1

t(w) = \sum_i v_i \cdot t(w - w_i)

We can combine the two parameters v_i and w_i into a single
polynomial V = \sum_i v_i x^{w_i}. For example, suppose we fill our
knapsack with four types of coin, of values 1, 5, 7 and 10 and
weights 3, 6, 8 and 6 respectively. The polynomial V is given by:

V = 1x^3 + 5x^6 + 7x^8 + 10x^6

Since we are using the max-plus semiring, this is equivalent to:

V = 1x^3 + 10x^6 + 7x^8



Represented as a list, the wth element of V is the value of the 
most valuable item with weight w (which is zero if there are no 
items of weight w). Similarly, we represent t(w) as the power series
T = \sum_i t(i) x^i. The list representation of T has as its wth
element the maximal value possible within a weight limit w.

We can now see that the definition of t(w) above is in fact the 
convolution of T and V. Together with the base case, that t(0) is 
the semiring's unit, this gives us a simpler definition of t(w): 

T = 1 + V · T

The above can equally be written as T = V*, and so we get
the following elegant solution to the unbounded integer knapsack
problem (where !! is Haskell's list indexing operator):

knapsack values maxweight = closure values !! maxweight

Note that our previous intuition of x* being the infinite sum 
1 + x + x^2 + ... applies nicely here: the solution to the integer
knapsack problem is the maximum value attainable by choosing no 
items, or one item, or two items, and so on for any number of items. 
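
The following self-contained sketch puts the pieces together for the coin example. The class and list instance restate the definitions from the text so that the block stands alone. MaxPlus is a hypothetical stand-in for LongestDistance, and fromItems is a helper of ours that builds the coefficient list of V. The fixity declarations are an assumption.

```haskell
infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

-- Power series over r as lazy coefficient lists, constant term first.
instance Semiring r => Semiring [r] where
  zero = []
  one = [one]
  [] @+ y = y
  x @+ [] = x
  (x:xs) @+ (y:ys) = (x @+ y) : (xs @+ ys)
  [] @. _ = []
  _ @. [] = []
  (a:p) @. (b:q) = (a @. b) : (map (a @.) q @+ map (@. b) p @+ (zero : (p @. q)))
  closure [] = one
  closure (a:p) = r
    where r = [closure a] @. (one : (p @. r))

-- Max-plus semiring (assumed shape): NegInf is zero, Fin 0 is one.
-- The derived Ord gives NegInf < Fin n < PosInf, so max is the addition.
data MaxPlus = NegInf | Fin Integer | PosInf deriving (Eq, Ord, Show)

instance Semiring MaxPlus where
  zero = NegInf
  one = Fin 0
  (@+) = max
  NegInf @. _ = NegInf
  _ @. NegInf = NegInf
  PosInf @. _ = PosInf
  _ @. PosInf = PosInf
  Fin a @. Fin b = Fin (a + b)
  closure x | x <= Fin 0 = Fin 0
            | otherwise = PosInf

-- Coefficient list of V: entry w is the best value among items of weight w.
fromItems :: [(Int, Integer)] -> [MaxPlus]
fromItems items =
  [ foldr (@+) zero [Fin v | (w', v) <- items, w' == w]
  | w <- [0 .. maximum (0 : map fst items)] ]

knapsack :: [MaxPlus] -> Int -> MaxPlus
knapsack values maxweight = closure values !! maxweight

-- The four coins from the text as (weight, value) pairs.
coins :: [MaxPlus]
coins = fromItems [(3, 1), (6, 5), (8, 7), (6, 10)]
```

In this sketch, knapsack coins 6 is Fin 10, knapsack coins 9 is Fin 11 and knapsack coins 12 is Fin 20. An index such as 7, which no combination of coins weighs exactly, yields NegInf, so the value within a limit is the maximum over the entries up to that index.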

Instead of using the LongestDistance semiring, we can define 
LongestPath in the same way that we defined ShortestPath 
above, with max in place of min. Using this semiring, our above 
definition of knapsack still works and gives the set of elements 
chosen for the knapsack, rather than just their total value. 

8. Linear recurrences and Petri nets 

The power series semiring has another general application: it can 
express linear recurrences between variables. Since the definition 
of "linear" can be drawn from an arbitrary semiring, this is quite an 
expressive notion. 

As we are discussing functional programming, we are obliged 
by tradition to calculate the Fibonacci sequence. 

The nth term of the Fibonacci sequence is given by the sum 
of the previous two terms. We construct the formal power series 
F whose nth coefficient is the nth Fibonacci number. Multiplying 
this sequence by x^k shifts along by k places, so we can rewrite the
recurrence relation as:

1 + xF + x^2 F = F

This defines F = 1 + (x + x^2)F, and so F = (x + x^2)*. So, we
can calculate the Fibonacci sequence as closure [0,1,1].
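
A runnable version of this calculation needs a Semiring instance for the coefficients. The sketch below assumes one for Integer in which closure is defined only at 0 (where 0* = 1), which is all the Fibonacci series requires. The class and list instance restate the definitions from the text. The Integer instance, the fixities and the names fibs and ones are our assumptions.

```haskell
infixl 9 @.
infixl 8 @+

class Semiring r where
  zero, one :: r
  closure :: r -> r
  (@+), (@.) :: r -> r -> r

-- Power series over r as lazy coefficient lists, constant term first.
instance Semiring r => Semiring [r] where
  zero = []
  one = [one]
  [] @+ y = y
  x @+ [] = x
  (x:xs) @+ (y:ys) = (x @+ y) : (xs @+ ys)
  [] @. _ = []
  _ @. [] = []
  (a:p) @. (b:q) = (a @. b) : (map (a @.) q @+ map (@. b) p @+ (zero : (p @. q)))
  closure [] = one
  closure (a:p) = r
    where r = [closure a] @. (one : (p @. r))

-- Assumed instance: ordinary arithmetic, with closure defined only at 0.
instance Semiring Integer where
  zero = 0
  one = 1
  (@+) = (+)
  (@.) = (*)
  closure 0 = 1
  closure _ = error "closure: only defined at 0 in this sketch"

-- F = (x + x^2)*, the series whose nth coefficient is the nth Fibonacci number.
fibs :: [Integer]
fibs = closure [0, 1, 1]

-- x* = 1 + x + x^2 + ..., the sequence 1, 1, 1, ...
ones :: [Integer]
ones = closure [0, 1]
```

Evaluating take 10 fibs gives [1,1,2,3,5,8,13,21,34,55], and take 5 ones gives [1,1,1,1,1]. Both lists are infinite, so they should only be inspected through a finite prefix.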

There are of course much more interesting things that can be 
described as linear recurrences and thus as formal power series. 
Cohen et al. [4] showed that a class of Petri nets known as timed 
event graphs can be described by linear recurrences in the max- 
plus semiring (the one we previously used for longest paths and 
knapsacks). 

A timed event graph consists of a set of places and a set of 
transitions, and a collection of directed edges between places and 
transitions. Atomic, indistinguishable tokens are consumed and 
produced by transitions and held in places. 

In a timed event graph, unlike a general Petri net, each place 
has exactly one input transition and exactly one output transition, as 
well as a nonnegative integer delay, which represents the processing 
time at that place. When a token enters a place, it is not eligible to 
leave until the delay has elapsed. 

When all of the input places of a transition have at least one 
token ready to leave, the transition "fires". One token is removed 
from each input place, and one token is added to each output place 
of the transition. For simplicity, we assume that transitions are 
instant, and that a token arrives at all of the output places of a 
transition as soon as one is ready to leave each of the input places. 
If desired, transition processing times can be simulated by adding 
extra places. 
Figure 3. A timed event graph with four transitions t_1, t_2, t_3, t_4
and five places A, B, C, D, E with delays in parentheses, where
all but two of the places are initially empty.



In Figure 3, the only transition which is ready to fire at time 0
is t_1. When it fires, it removes a token from B and adds one to A.
This makes transition t_2 fire at time 1, which adds a token to B
causing t_1 to fire at time 2, then t_2 to fire at time 3 and so on.

When t_2 fires at times 1, 3, 5, ..., a token is added to place D.
The first three times this happens, t_3 fires, but after that the
supply of tokens from C is depleted. t_4 fires after the tokens have
waited in E for a delay of three steps, so t_4 fires at times 4, 6
and 8.

Simulating such a timed event graph requires calculating the 
times at which tokens arrive and leave each place. For each place 
p, we define the sequences IN(p) and OUT(p). The ith element of 
IN(p) is the time at which a token arrives into p for the ith time. 
The ith element of OUT(p) is the time at which a token becomes 
available from p for the ith time, which may be some time before 
it actually leaves p. 

In the example of Figure 3, we have: 



IN(A) = 0, 2, 4, 6, ...        OUT(A) = 1, 3, 5, 7, ...
IN(B) = 1, 3, 5, 7, ...        OUT(B) = 0, 2, 4, 6, ...
IN(C) = (empty)                OUT(C) = 0, 0, 0
IN(D) = 1, 3, 5, 7, ...        OUT(D) = 1, 3, 5, 7, ...
IN(E) = 1, 3, 5                OUT(E) = 4, 6, 8



We say that a place p' is a predecessor of p (and write p' ∈
pred(p)) if the output transition of p' is the input transition of p. 
Since transitions fire instantly, a place receives a token as soon as 
all of its predecessors are ready to produce one. 



IN(p)_i = \max_{p' \in pred(p)} OUT(p')_i



Exactly when the ith token becomes available from a place 
p depends on the amount of time tokens spend processing at p, 
which we write as delay(p), and on the number of tokens initially
in p, which we write as nstart(p). The times at which the
first nstart(p) tokens become ready to leave p are given by the 
sequence START(p), which is nondecreasing and each element 
of which is less than delay(p). In the example we assume the 
initial tokens of B and C are immediately available, so we have 
START(B) = 0 and START(C) = 0, 0, 0. 

Thus, the time that the ith token becomes available from p is 
given by: 



OUT(p)_i = \begin{cases}
  START(p)_i & i < nstart(p) \\
  IN(p)_{i - nstart(p)} + delay(p) & i \ge nstart(p)
\end{cases}



By adopting the convention that IN(p)_i is -∞ when i < 0 and
that START(p)_i is -∞ when i < 0 or i ≥ nstart(p), we can
write the above more succinctly as:

OUT(p)_i = \max(START(p)_i, IN(p)_{i - nstart(p)} + delay(p))



This gives a set of recurrences between the sequences: the value 
of OUT(p) depends on the previous values of IN(p). 
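
These recurrences can be run directly on lazy lists before any semiring machinery is introduced. The sketch below simulates only the cycle between places A and B of Figure 3. The delays of 1 for both places are inferred from the sequences listed earlier rather than read from the figure, and Place, inTimes and outTimes are hypothetical names.

```haskell
-- A place of a timed event graph: its processing delay and the times at
-- which its initial tokens become available (START in the text).
data Place = Place { delay :: Integer, start :: [Integer] }

-- IN(p)_i is the maximum over predecessors p' of OUT(p')_i.
-- Sketch-level choice: the result is as long as the shortest input list.
-- The list of predecessor sequences must be non-empty.
inTimes :: [[Integer]] -> [Integer]
inTimes = foldr1 (zipWith max)

-- OUT(p)_i is START(p)_i for the initial tokens, and
-- IN(p)_(i - nstart(p)) + delay(p) afterwards. On lists this is the
-- START sequence followed by the delayed IN sequence.
outTimes :: Place -> [Integer] -> [Integer]
outTimes p ins = start p ++ map (+ delay p) ins

-- The A/B cycle: A starts empty, B starts with one token ready at time 0.
placeA, placeB :: Place
placeA = Place { delay = 1, start = [] }
placeB = Place { delay = 1, start = [0] }

inA, inB, outA, outB :: [Integer]
inA = inTimes [outB]    -- B is the only predecessor of A
inB = inTimes [outA]    -- A is the only predecessor of B
outA = outTimes placeA inA
outB = outTimes placeB inB
```

Taking the first four elements gives outA = 1, 3, 5, 7 and outB = 0, 2, 4, 6, matching the sequences for A and B in the example. The mutual recursion is productive because B's initial token supplies the first element without consulting A.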

We now shift notation to make the semiring structure of this 
problem apparent. We return to the max-plus algebra, previously 
used for longest distances and knapsacks, where we write max 
as +, and addition as ·. Instead of sequences, let's talk about
formal power series, where the ith element of the sequence is now
the coefficient of x^i. With our semiring goggles on, the above
equations now say: 



IN(p) = \sum_{p' \in pred(p)} OUT(p')

OUT(p) = delay(p) \cdot x^{nstart(p)} \cdot IN(p) + START(p)

We can eliminate IN(p) by substituting its definition into the 
second equation: 



OUT(p) = \sum_{p' \in pred(p)} delay(p) \cdot x^{nstart(p)} \cdot OUT(p') + START(p)

What we're left with is a system of affine equations, where the 
unknowns, the coefficients and the constants are all formal power 
series over the max-plus semiring. 

We can solve these exactly as before. We build the matrix M
containing all of the delay(p) · x^{nstart(p)} coefficients, and the
column vector S containing all of the START(p) sequences, and
then calculate M* · S (which, as before, can be done with a single
call to closure and a multiplication by S). The components of the
resulting vector are power series; their coefficients give OUT(p) 
for each place p. 

Thus, we can simulate a timed event graph by representing 
it as a linear system and using our previously-defined semiring 
machinery. 

9. Discussion 

It turns out that very many problems can be solved with linear 
algebra, for a definition of "linear" suited to the problem at hand. 
There are surprisingly many questions that can be answered with a 
call to closure with the right Semiring instance. Even still, this 
paper barely scratches the surface of this rich theory. Much more 
can be found in books by Gondran and Minoux [9], Golan [7, 8] 
and others. 

The connections between regular languages, path problems in 
graphs, and matrix inversion have been known for some time. The 
relationship between regular languages and semirings is described 
in Conway's book [5]. Backhouse and Carré [3] used regular algebra
to solve path problems (noting connections to classical linear
algebra), and Tarjan [17] gave an algorithm for solving path
problems by a form of Gaussian elimination.

A version of closed semiring was given by Aho, Hopcroft and 
Ullman [2], along with transitive closure and shortest path algo- 
rithms. The form of closed semiring that this paper discusses was 
given by Lehmann [12], with two algorithms for calculating the
closure of a matrix: an algorithm generalising the Floyd-Warshall
all-pairs shortest-paths algorithm [6], and another generalising
Gaussian elimination, demonstrating the equivalence of these two in
their general form. More recently, Abdali and Saunders [1]
reformulate the notion of closure of a matrix in terms of
"eliminants", which formalise the intermediate steps of Gaussian
elimination.

The use of semirings to describe path problems in graphs is 
widespread [9, 16]. Often, the structures studied include the extra 
axiom that a + a = a, giving rise to idempotent semirings or 
dioids. Such structures can be partially ordered, and it becomes 
possible to talk about least fixed points of affine maps. These have 
proven strong enough structures to build a variant of classical real
analysis [13]. 

Cohen et al. [4], as well as providing the linear description 
of Petri nets we saw in section 8, go on to develop an analogue 
of classical linear systems theory in a semiring. In this theory, 
they explore semiring versions of many classical concepts, such as 
stability of a linear system and describing a system's steady-state 
as an eigenvalue of a transfer matrix. 